Abstract

Continuous advances in data collection and storage techniques allow us to observe and
record real-life processes in great detail. Examples include financial transaction data, fMRI
images, satellite photos and the distribution of pollution across the Earth over time. Due to the
high dimensionality of such data, classical statistical tools become inadequate and inefficient.
New methods are needed, and one of the most prominent techniques in this context is functional
data analysis (FDA).

The main objective of this article is to present techniques for the analysis of temporal
dependence in FDA. Such dependence occurs, for example, if the data consist of a continuous-time
process which has been cut into segments, for instance days. We are then in the context of
so-called functional time series.


1 Introduction 

In this article we introduce the foundations of functional time series and of frequency-domain
analysis in this context. We address the article to a broad audience, assuming only elementary
knowledge of probability theory and algebra. Although we try to keep the text accurate, in some
places we sacrifice detailed investigation for intuitive argumentation, referring advanced readers
to appropriate sources.

The manuscript is divided into two parts. In the first part, we introduce concepts from statistics
and functional data analysis (FDA). We build upon basic ideas about continuous functions and
probability. In the second part, we present some state-of-the-art results on linear models for
functional objects.

1.1 Motivation for statistics on functional data 

The main concern of statistics is to obtain essential information from a sample of observations
$X_1, X_2, \ldots, X_N$ from some space of objects. We are given a finite sample of size
$N \in \mathbb{N}$, where the $X_i$ can be scalars (like the heights of a sample of students in a
school), vectors (like points on a target after throwing several darts) or more complex objects,
like genotypes, fMRI scans, images or frames of a video.

Functional data analysis deals with observations which can be naturally expressed as functions.
Figures 1, 2 and 3 present several cases, from various areas of science, which fit into the
framework of functional data analysis.

When we deal with a physical process, it is often natural to assume that it behaves in a
continuous manner and that the observations do not oscillate significantly between the
measurements. Although in the Digital Age we rarely record analog processes continuously, we often
have enough data points that interpolation does not cause a significant error. Models
incorporating this additional structure can lead to more precise and meaningful findings. In this
context, FDA can be seen as a tool which embeds the continuity feature into the model.

Besides providing a good approximation of a continuous process, FDA can also prove useful in a
noisy, discontinuous case. There, FDA can serve as a tool for denoising and smoothing the data, and
it is beneficial whenever the underlying process is the main concern, as for example in finance.

From a pragmatic perspective, functional data can be seen simply as infinite-dimensional
vectors, with extended notions of variance and mean, and thus we may be tempted to employ classical
multivariate techniques. However, this approach raises many practical and theoretical problems
and is not advised, as we will see later.

The FDA approach is also useful for obtaining a parsimonious representation of the data by taking
advantage of their smoothness. Instead of looking at a function as a dense vector of values, we can
often represent it as a linear combination of a handful of (well-chosen) basis functions.

Practical applications of functional data analysis are spread across many areas of science and
engineering. Panaretos et al. [IT] use closed curves $[0,1] \to \mathbb{R}^3$ to analyze the
behavior of DNA minicircles, providing a testing methodology for the comparison of two classes of
curves. Aston and Kirch [1] analyze stationarity and change point detection for functional time
series, with applications to fMRI data. Hadjipantelis et al. [5] analyze the Mandarin language
using functional principal components. Functional time series also naturally emerge in financial
applications: Kokoszka and Reimherr [Bj analyze the predictability of the shape of intraday price
curves. These works are only a fraction of the ongoing research, and for a more complete survey of
applications and theory we refer to the books m, @3, M and [2].

1.2 Hilbert spaces 

For most of the results presented in this work we only require a separable Hilbert space, i.e. a
linear metric space with the norm induced by an inner product and with a countable basis.¹ This
makes the setup very general, but for simplicity, and in order to give an intuitive example for
each of the results, in most cases we will assume the concrete space of square-integrable functions
on the bounded interval $[0,1]$, which we will refer to as $L^2$. A function
$f : [0,1] \to \mathbb{R}$ belongs to $L^2$ if and

¹ In this work we introduce Hilbert spaces in an elementary and accessible way, avoiding technical
details. For formal definitions and an investigation of the key properties, refer to m-
Figure 1: Berkeley Growth Data: heights of 20 girls measured from age 0 through 18 (left). The
growth process is easier to visualize in terms of acceleration (right). Tuddenham and Snyder [llj
and Ramsay and Silverman [12].





Figure 2: Lower lip movement (top), acceleration (middle) and EMG of a facial muscle (bottom) of
a speaker pronouncing the syllable "bob" for 32 replications, plotted against time in milliseconds.
Malfait, Ramsay, and Froda 0
Figure 3: Projections of DNA minicircles on the planes given by the principal axes of inertia (three
panels on the left side: TATA curves, right: CAP curves). Mean curves are plotted in white.
Panaretos, Kraus and Maddocks HU
only if

$$\int_0^1 f^2(x)\,dx < \infty.$$


In this section we present elementary properties of Hilbert spaces.

With the space $L^2$ we associate an inner product, a bilinear map
$L^2 \times L^2 \to \mathbb{R}$. For two functions $f, g \in L^2$, we define the inner product as

$$\langle f, g \rangle = \int_0^1 f(x)\, g(x)\,dx.$$

We define the norm of an element $f \in L^2$ as $\|f\| = \sqrt{\langle f, f \rangle}$. Since $L^2$
is a space of square-integrable functions, each element $f \in L^2$ has a finite norm. We say that
$f$ and $g$ are orthogonal if $\langle f, g \rangle = 0$. If, additionally, $\|f\| = \|g\| = 1$, we
say that $f$ and $g$ are orthonormal. Both definitions have counterparts in vector spaces: the norm
corresponds to the "distance" from the zero function, whereas orthonormal elements behave like
perpendicular unit vectors.
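
As a small numerical illustration of these definitions, the sketch below approximates the inner product and the norm on $[0,1]$ with a midpoint rule. The helper names `inner` and `norm` and the two test functions are assumptions made for this example only; they do not come from the text.

```python
import numpy as np

def inner(f, g, n=10_000):
    """Approximate <f, g> = int_0^1 f(x) g(x) dx with the midpoint rule."""
    x = (np.arange(n) + 0.5) / n          # midpoints of n equal subintervals of [0, 1]
    return float(np.mean(f(x) * g(x)))    # every subinterval has width 1/n

def norm(f, n=10_000):
    """Norm induced by the inner product: ||f|| = sqrt(<f, f>)."""
    return float(np.sqrt(inner(f, f, n)))

one = lambda x: np.ones_like(x)           # the constant function 1
centered = lambda x: x - 0.5              # a function with zero integral on [0, 1]

print(inner(one, centered))               # close to 0, so the two functions are orthogonal
print(norm(lambda x: x))                  # close to 1 / sqrt(3)
```

The constant function and $x - 1/2$ are orthogonal but not orthonormal, since the second one has norm $1/\sqrt{12}$ rather than 1.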

A Hilbert space is called separable if we can find a sequence of pairwise orthonormal elements
$e_1, e_2, e_3, \ldots$ such that each element $f \in L^2$ can be expressed as a weighted sum of the
elements $e_1, e_2, e_3, \ldots$. This sequence $\{e_i\}_{i \ge 1}$ is called a basis, and each
$e_i$ is a basis function.

One can show that, given the basis $\{e_i\}_{i \ge 1}$, the representation of $f \in L^2$ is
uniquely given by

$$f = \sum_{i=1}^{\infty} \langle e_i, f \rangle e_i, \tag{1}$$

where the scalars $\langle e_i, f \rangle$ are called the coefficients of $f$ in the basis
$\{e_i\}_{i \ge 1}$.

We may find infinitely many bases of $L^2$. As an example often used in practice, consider the
Fourier basis, defined as $e_1(x) = 1$ and, for $i > 1$,

$$e_i(x) = \begin{cases} \sqrt{2}\,\sin(2\pi k x), & \text{if } i = 2k + 1, \\ \sqrt{2}\,\cos(2\pi k x), & \text{if } i = 2k, \end{cases}$$

where $k \ge 1$ is an integer. The first few elements are presented in Figure 4. A proof that these
elements are orthonormal is a simple exercise. The proof that each element $f \in L^2$ can be
uniquely expressed as a linear combination (1) is more complicated, and the interested reader is
referred to [13].

We will use one of the fundamental equations in separable Hilbert spaces, Parseval's identity.

Lemma 1 (Parseval's identity). Let $f \in L^2$ and let $\{e_i\}_{i \ge 1}$ be an orthonormal basis
of $L^2$. Then

$$\|f\|^2 = \sum_{i=1}^{\infty} |\langle e_i, f \rangle|^2.$$
Figure 4: First 5 Fourier basis functions in the order black, red, green, blue and light blue (left).
Step function approximated by a weighted sum of 1, 2, 3 and 4 basis functions (right).


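Proof. It follows from the definition of the norm and the representation (1) that

$$\begin{aligned}
\langle f, f \rangle &= \Big\langle \sum_{i=1}^{\infty} \langle e_i, f \rangle e_i,\ \sum_{j=1}^{\infty} \langle e_j, f \rangle e_j \Big\rangle \\
&= \sum_{i=1}^{\infty} \langle e_i, f \rangle \Big\langle e_i,\ \sum_{j=1}^{\infty} \langle e_j, f \rangle e_j \Big\rangle \\
&= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \langle e_i, f \rangle \langle e_j, f \rangle \langle e_i, e_j \rangle \\
&= \sum_{i=1}^{\infty} |\langle e_i, f \rangle|^2,
\end{aligned}$$

because $\langle e_j, f \rangle \langle e_i, e_j \rangle = \langle e_i, f \rangle$ if $i = j$ and
$0$ otherwise, due to the orthonormality of the sequence $\{e_i\}_{i \ge 1}$.

□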
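
The orthonormality of a Fourier basis and Parseval's identity can be checked numerically. The sketch below assumes the normalized basis $1, \sqrt{2}\cos(2\pi k x), \sqrt{2}\sin(2\pi k x)$ on $[0,1]$ and the test function $f(x) = x(1-x)$, whose squared norm is $1/30$; the function `fourier_basis`, the grid size and the number of coefficients are choices made for this illustration.

```python
import numpy as np

def fourier_basis(i, x):
    """i-th normalized Fourier basis function on [0, 1], for i = 1, 2, 3, ..."""
    if i == 1:
        return np.ones_like(x)
    k = i // 2                                   # frequency shared by i = 2k and i = 2k + 1
    if i % 2 == 0:
        return np.sqrt(2.0) * np.cos(2 * np.pi * k * x)
    return np.sqrt(2.0) * np.sin(2 * np.pi * k * x)

n = 4000
x = (np.arange(n) + 0.5) / n                     # midpoint grid on [0, 1]

# Gram matrix of the first 7 basis functions; it should be close to the identity.
E = np.stack([fourier_basis(i, x) for i in range(1, 8)])
gram = E @ E.T / n

# Parseval check for f(x) = x (1 - x) using the first 101 coefficients.
f = x * (1 - x)
coef = np.array([np.mean(fourier_basis(i, x) * f) for i in range(1, 102)])
norm_sq = np.mean(f ** 2)                        # approximates ||f||^2 = 1/30
partial_sum = np.sum(coef ** 2)                  # truncated right-hand side of the identity

print(np.round(gram, 6))
print(norm_sq, partial_sum)
```

Because the coefficients of this particular $f$ decay quickly, the truncated sum is already very close to the squared norm; for rougher functions many more terms would be needed.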


We will also use the notions of linear operators and Hilbert-Schmidt operators. A linear operator
$F : L^2 \to L^2$ is a function such that for any pair of scalars $a, b$ and elements
$v, w \in L^2$, $F(av + bw) = aF(v) + bF(w)$. A Hilbert-Schmidt operator $F$ is a linear operator
such that

$$\sum_{i=1}^{\infty} \|F(e_i)\|^2 < \infty,$$

where $\{e_i\}_{i \ge 1}$ is an orthonormal basis of $L^2$. An operator $F$ is symmetric if
$\langle F(v), w \rangle = \langle v, F(w) \rangle$ for each $v, w \in L^2$.
1.3 Representation and fit


In practice we are often given just a sample of observations from a curve and we need to interpolate, 
i.e. draw a curve between these points which is most likely to be close to the underlying process. 

In this work, we follow the ideas popularized by Ramsay and Silverman [12], based on a basis
function expansion. Note that, given the representation (1), by Parseval's identity, for any
$\varepsilon > 0$ there exists $d$ such that

$$\sum_{i=d+1}^{\infty} |\langle e_i, f \rangle|^2 < \varepsilon.$$

We can therefore approximate the function with arbitrary precision $\varepsilon > 0$ using only the
first $d$ basis elements.

In practice, inner products, which are typically obtained by integration, will themselves be
approximated by corresponding sums. Then, a discretized sample curve
$(x(t_j) : 1 \le j \le n)$ can be transformed into a (finite-dimensional) curve $y(t)$ through

$$y(t) := \sum_{i=1}^{d} \Big( \sum_{j=1}^{n} x(t_j)\, e_i(t_j)\, (t_j - t_{j-1}) \Big) e_i(t),$$

for some grid $0 = t_0 < t_1 < t_2 < \ldots < t_n \le 1$, where the expression in brackets accounts
for the approximation of the inner product between $x$ and $e_i$.
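
The transformation of a discretized curve into a finite basis expansion can be sketched as follows. The example assumes a normalized Fourier basis, a uniform grid $t_j = j/n$ with $t_0 = 0$, and a test curve that lies in the span of the first basis functions; the helpers `project` and `reconstruct` are names chosen here and not taken from the text.

```python
import numpy as np

def fourier_basis(i, t):
    """i-th normalized Fourier basis function on [0, 1], for i = 1, 2, 3, ..."""
    if i == 1:
        return np.ones_like(t)
    k = i // 2
    if i % 2 == 0:
        return np.sqrt(2.0) * np.cos(2 * np.pi * k * t)
    return np.sqrt(2.0) * np.sin(2 * np.pi * k * t)

def project(x_vals, t, d):
    """Coefficients c_i ~ <x, e_i> from samples x(t_j), using sums instead of integrals."""
    dt = np.diff(np.concatenate(([0.0], t)))      # t_j - t_{j-1}, with t_0 = 0
    return np.array([np.sum(x_vals * fourier_basis(i, t) * dt) for i in range(1, d + 1)])

def reconstruct(coef, t_new):
    """Evaluate y(t) = sum_i c_i e_i(t) at the points t_new."""
    return sum(c * fourier_basis(i, t_new) for i, c in enumerate(coef, start=1))

n = 200
t = np.arange(1, n + 1) / n                       # grid t_1 < ... < t_n = 1
x_vals = 1.0 + 0.5 * np.sqrt(2) * np.sin(2 * np.pi * t)   # x = e_1 + 0.5 e_3, sampled on the grid

coef = project(x_vals, t, d=5)
y = reconstruct(coef, t)
print(np.round(coef, 6))
```

For this test curve the recovered coefficients should be close to $(1, 0, 0.5, 0, 0)$; for real, noisy data the choice of $d$ acts as a smoothing parameter.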

Fitting and representation of functional data is an important and intensively studied topic in
its own right. In this article, however, we assume that the data are fully observed, i.e. we
observe the whole curves instead of just a sample of points. For more information on fitting we
refer to [12].

1.4 Statistics in Hilbert spaces 

The random function is the key concept used in the sequel. We can think of it as an extension of a
random variable or of a random vector in $\mathbb{R}^2$. Instead of drawing a random point from a
plane, we draw a whole function from the set of all functions in $L^2$. In this section we
introduce the mean function and the covariance operator. We will extend the concept of the expected
value of a scalar variable to vectors and functions.

In the space $\mathbb{R}^2$, the mean is the expected value of each coordinate: for a random
vector $X = (X_1, X_2)$, we define the mean of this variable as $EX = (EX_1, EX_2)$.

Similarly, we can look at a random function $X$ and define its mean as the mean of each
coefficient. Let us take a fixed basis and the expansion of $X$ given by

$$X = \sum_{i=1}^{\infty} \langle e_i, X \rangle e_i.$$

Then the random variables $\langle e_i, X \rangle$ correspond to the coefficients, so the mean
function can be defined as

$$EX = \sum_{i=1}^{\infty} E\langle e_i, X \rangle\, e_i,$$

where we defined the expectation of a random function as a sum of scalar expectations
$E\langle e_i, X \rangle$. Note that, as in the scalar case, $E\langle e_i, X \rangle$ may not be
finite, and then $EX$ does not exist either. Moreover, in order to have $EX \in L^2$, we need the
sequence $(E\langle e_i, X \rangle)_{i \ge 1}$ to be square summable.

Defining the covariance in a functional space is more complicated and, in order to show a direct
relation, we recall a non-standard representation of the covariance in a multidimensional space.
One way to look at the covariance of a random vector $X$ in $\mathbb{R}^2$ is to see it as a linear
function $\mathbb{R}^2 \to \mathbb{R}^2$, defined as

$$C(v) = E\langle X, v \rangle X,$$

where $v \in \mathbb{R}^2$. We have $C(v) = (EXX')v$ (and we refer to the covariance as
$C = EXX'$). Note that, since $\langle X, v \rangle$ is a scalar, $\langle X, v \rangle X$ is a
random vector, and therefore we just use the definition of the mean vector.

In the functional space, we define the covariance operator as an operator $L^2 \to L^2$. For a
random element $X$ in $L^2$, similarly to the vector case, we take the expectation of
$\langle X, f \rangle X$, i.e.

$$C_X(f) = E\langle X, f \rangle X, \quad \text{where } f \in L^2.$$

We call $\langle X, \cdot \rangle X$ an outer product and write
$X \otimes X = \langle X, \cdot \rangle X$. Again, we consider only random functions $X$ for which
$C_X$ exists. One can show that the operator $C_X$ is a positive-definite, symmetric
Hilbert-Schmidt operator and therefore it can be inverted.
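
A sample version of the mean function and of the covariance operator can be sketched on a grid. The simulated curves, the grid size and the helper names below are assumptions made for illustration; following the definition $C_X(f) = E\langle X, f \rangle X$ used here, the sample operator is not centered.

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_curves = 200, 500
t = (np.arange(n_grid) + 0.5) / n_grid            # midpoint grid on [0, 1]

# Hypothetical sample: X_n(t) = t + xi_1 sqrt(2) sin(2 pi t) + 0.5 xi_2 sqrt(2) cos(2 pi t)
xi = rng.standard_normal((n_curves, 2))
X = (t
     + xi[:, [0]] * np.sqrt(2) * np.sin(2 * np.pi * t)
     + 0.5 * xi[:, [1]] * np.sqrt(2) * np.cos(2 * np.pi * t))   # shape (n_curves, n_grid)

mean_hat = X.mean(axis=0)                         # sample version of EX on the grid

def inner(f, g):
    """Midpoint-rule approximation of the L2 inner product along the last axis."""
    return np.mean(f * g, axis=-1)

def cov_operator(X, f):
    """Sample version of C_X(f) = E <X, f> X (uncentered, as in the definition above)."""
    return (inner(X, f)[:, None] * X).mean(axis=0)

f = np.cos(2 * np.pi * t)
Cf = cov_operator(X, f)

# The same operator written with the kernel c(s, u) = E X(s) X(u):
# C_X(f)(s) = int_0^1 c(s, u) f(u) du.
kernel = X.T @ X / n_curves
Cf_kernel = kernel @ f / n_grid
```

The two computations agree up to rounding, which shows that on a grid the covariance operator acts like multiplication by a symmetric matrix, in line with the vector case $C(v) = (EXX')v$.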

2 Functional Time Series 

In many practical situations functions are naturally ordered in time, for example when we deal
with daily observations of the stock market or with sequences of tumor scans. We are then in the
context of so-called functional time series (FTS).

As a motivating example consider Figure 5. Here, the assumption of independence can be too
strong: values at the beginning of each day are highly correlated with those at the end of the
preceding day. Moreover, we see that big jumps are often followed by significant drops.

These and similar features may indicate significant temporal dependence not just within a
subject, but also between different subjects (e.g. days). In this section we discuss possible
frameworks which allow us to quantify and use this additional information.

2.1 Stationarity 

Many physical processes are known to have a time-invariant distribution. This motivates the
frequentist approach to time series, where we assume that the structure does not change in time and
we draw inference from estimated covariances.

Figure 5: Horizontal component of the magnetic field measured at one-minute resolution at the
Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT, with 1440 measurements
per day.

Let $\{X_t\}$ be a sequence of random functions. We say that $\{X_t\}$ is stationary in the strong
sense if for any $h \in \mathbb{Z}$, $k \in \mathbb{N}$ and any sequence of indices
$t_1, t_2, \ldots, t_k$, the vectors $(X_{t_1}, \ldots, X_{t_k})$ and
$(X_{h+t_1}, \ldots, X_{h+t_k})$ are identically distributed.

We also define weak stationarity by looking only at the second-order structure of the series. We
say that $\{X_t\}$ is weakly stationary if $E\|X_t\|^2 < \infty$ and

1. $EX_t = EX_0$ for each $t \in \mathbb{Z}$ and

2. $EX_t \otimes X_s = EX_{t-s} \otimes X_0$ for each $t, s \in \mathbb{Z}$.

Additionally, we will assume that the series in the sequel are weakly dependent, which intuitively
means that observations from the far past have little to no effect on the present. Many frameworks
have been suggested to quantify this behavior; for a survey of the most popular ones refer to [7].
For simplicity, in this work we assume a very strong condition: we call a series weakly dependent
if it is stationary and

$$\sum_{t=0}^{\infty} \|E X_t \otimes X_0\| < \infty,$$

meaning that the covariance between distant elements decays very fast.

2.2 Functional linear regression 

One of the most popular frameworks in classical statistics is linear regression, where we try to
quantify the linear dependence between two (possibly multivariate) variables $X$ and $Y$. The
problem of finding a relation of this type can also be addressed in FDA. As an illustrative
example, we can think of the relation between a farm's income during the year (a function of time
defined on a yearly interval) and the precipitation over that year.
We assume the model

$$Y_t = A(X_t) + \varepsilon_t, \quad t \ge 1, \tag{2}$$

where $A$ is a linear Hilbert-Schmidt operator $L^2 \to L^2$ and $(\varepsilon_t)$ is a functional
white noise sequence, i.e. a sequence of independent identically distributed random functions,
independent of $(X_t)$.

As we are concerned with functional time series, we will assume that $(X_t)$ and $(Y_t)$ are
weakly stationary and weakly dependent. The classical case of iid $X_t$ is of great scientific
interest, and the interested reader is referred to m and m-

Although functional linear regression shares many properties with its multivariate equivalent,
there are important theoretical difficulties which preclude a direct extension of the results from
the simpler setup. In particular, we note that the linear operator $A : L^2 \to L^2$ is
infinite-dimensional, which considerably complicates the estimation. If we approach the problem in
the classical way, by multiplying both sides of (2) by $X_t$ (in the sense of the outer product)
and taking the expectation, for $t \ge 1$ we have

$$EY_t \otimes X_t = EA(X_t) \otimes X_t + E\varepsilon_t \otimes X_t = EA(X_t) \otimes X_t = A(EX_t \otimes X_t),$$

by the independence of $X_t$ and $\varepsilon_t$. For convenience, let us denote this by

$$C_{XY} = A C_X, \tag{3}$$

where $C_{XY}$ is the cross-covariance operator of $X$ and $Y$ and $C_X$ is the covariance operator
of $X$. The natural way to obtain $A$ is to apply the inverse of $C_X$ to both sides of equation
(3), which yields

$$A = C_{XY} (C_X)^{-1}.$$

The main problem is that the operator $(C_X)^{-1}$ is not bounded. Indeed, the domain of
$(C_X)^{-1}$ is only a subset $D$, say, of $L^2$. To see this, note that formally, as the inverse
of $C_X$ is a linear operator, we may express
$(C_X)^{-1}(x) = \sum_{k \ge 1} \lambda_k^{-1} \langle x, e_k \rangle e_k$, where $\lambda_k$ and
$e_k$ are the eigenvalues (tending to zero) and eigenfunctions of $C_X$. Hence,
$D = \{x \in L^2 : \sum_{k \ge 1} \langle x, e_k \rangle^2 \lambda_k^{-2} < \infty\}$. The problem
can be approached by some form of regularization. For example, one may replace $(C_X)^{-1}$ by a
finite-dimensional approximation of the form $\sum_{k \le K} \lambda_k^{-1} e_k \otimes e_k$, where
$K$ is a tuning parameter. This is still quite delicate when applied to the sample version: for
large values of $K$, if we underestimate one of the small eigenvalues, its reciprocal explodes and
leads to very unstable estimators. On the other hand, for small $K$ we may get a very poor
approximation of $A$.
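
The truncated inverse can be sketched in a finite-dimensional setting, where each curve is represented by a vector of basis coefficients. Everything in the example below is an assumption made for illustration: the dimension, the decay of the eigenvalues, the "true" matrix `A`, the noise level and the function name `truncated_estimator`. It is not the estimator of any of the cited works.

```python
import numpy as np

def truncated_estimator(X, Y, K):
    """Estimate A in Y = A X + noise from coefficient vectors (rows of X and Y).

    Uses A_K = C_XY * sum_{k <= K} lambda_k^{-1} e_k e_k^T with sample covariances
    that are not centered, i.e. the data are assumed to have mean zero.
    """
    N = X.shape[0]
    C_X = X.T @ X / N                       # sample covariance of X
    C_XY = Y.T @ X / N                      # sample cross-covariance, so that C_XY ~ A C_X
    lam, e = np.linalg.eigh(C_X)            # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:K]       # indices of the K largest eigenvalues
    inv_K = (e[:, order] / lam[order]) @ e[:, order].T
    return C_XY @ inv_K

rng = np.random.default_rng(1)
d, N = 8, 300
scales = 1.0 / np.arange(1, d + 1) ** 2     # standard deviations of the scores (fast decay)
X = rng.standard_normal((N, d)) * scales
A = rng.standard_normal((d, d)) / d         # hypothetical "true" operator
Y = X @ A.T + 0.1 * rng.standard_normal((N, d))

for K in (1, 2, 4, 8):
    err = np.linalg.norm(truncated_estimator(X, Y, K) - A)
    print(K, round(float(err), 3))
```

Printing the error for several values of $K$ lets one inspect the trade-off described above: a small $K$ discards most of $A$, while a large $K$ divides by small, noisily estimated eigenvalues.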

This difficulty was addressed by Bosq [2], who gives an extensive survey of the problem. However,
the proposed results are based on strong assumptions on the rate of convergence of the eigenvalues,
which are impossible to check in practice. An alternative, elementary data-driven approach was
suggested in 0.

Finally, note that exactly the same technique can be used for lagged linear regression, i.e. where
the response $Y_t$ depends linearly not only on the current observation $X_t$ but also on earlier
elements of the series $(X_t)$. Consider

$$Y_t = \sum_{k=0}^{m} A_k(X_{t-k}) + \varepsilon_t, \tag{4}$$

where $m \in \mathbb{N}$. Again, as an example, we can think of the income during a given year as
depending on the precipitation in the last 3 years.

Let $m$ be the largest lag that we want to take into account and let us introduce
$Z_t = (X_t, X_{t-1}, \ldots, X_{t-m}) \in (L^2)^{m+1}$. One can easily show that the space
$(L^2)^{m+1}$ is a Hilbert space. Then the model can be written as

$$Y_t = BZ_t + \varepsilon_t, \tag{5}$$

where $B : (L^2)^{m+1} \to L^2$ is a linear operator such that
$BZ_t = \sum_{k=0}^{m} A_k(X_{t-k})$.

Now, for estimating $B$ in (5), we can apply the same estimation procedures as in (2). This
method of estimation in lagged regression models is efficient only for small dimensions and small
$m$, as opposed to the technique briefly introduced in the following section.
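
The stacking step behind model (5) can be sketched for coefficient vectors. The example assumes that each $X_t$ is represented by $d$ basis coefficients and uses ordinary least squares as a stand-in for the estimation procedure; the helper `stack_lags`, the simulated data and the matrices `A` are hypothetical.

```python
import numpy as np

def stack_lags(X, m):
    """Rows Z_t = (X_t, X_{t-1}, ..., X_{t-m}) for t = m, ..., N-1; X has shape (N, d)."""
    N = X.shape[0]
    return np.hstack([X[m - k : N - k] for k in range(m + 1)])

rng = np.random.default_rng(2)
N, d, m = 400, 3, 2
X = rng.standard_normal((N, d))
A = [rng.standard_normal((d, d)) for _ in range(m + 1)]    # hypothetical A_0, ..., A_m

Z = stack_lags(X, m)                                        # shape (N - m, (m + 1) * d)
B = np.hstack(A)                                            # B = [A_0, A_1, ..., A_m]
Y = Z @ B.T + 0.05 * rng.standard_normal((N - m, d))        # Y_t = sum_k A_k X_{t-k} + eps_t

B_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T              # least squares fit of model (5)
A_hat = np.split(B_hat, m + 1, axis=1)                      # blocks estimating A_0, ..., A_m
```

The stacked regressor has $(m+1)d$ coordinates, so the number of unknowns grows quickly with both $d$ and $m$, which is one way to see why this approach is practical only for small dimensions and few lags.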

2.3 Frequency-domain methods 

The lagged linear model (4) can be linked with the concept of linear filtering, popular in
multivariate time series analysis as well as in signal processing. For the theory and a survey of
applications in this context we refer to the classical book of Oppenheim and Schafer m-

Definition 1. We say that $A = \{A_k\}_{k \in \mathbb{Z}}$ is a linear filter if for each
$k \in \mathbb{Z}$, $A_k : L^2 \to L^2$ is a linear operator and
$\sum_{k \in \mathbb{Z}} \|A_k\|^2 < \infty$.

In order to estimate the operators $A_k$ in (4) more efficiently than with the approach of
Section 2.2, we employ Fourier analysis and results from the seminal work of Brillinger [3].

The Fourier transform has two important properties which simplify the analysis of the process (4).

First, multiplication in the frequency domain is equivalent to convolution in the time domain.

Second, the Fourier transform is a bijection, so results in the frequency domain are equivalent to
those in the time domain.

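To illustrate the use of these features, let us multiply the lagged model (4), with the sum taken
over all $k \in \mathbb{Z}$, by $X_s$ for some $s \in \mathbb{Z}$ and take the expectation. By
linearity we have

$$EY_t \otimes X_s = \sum_{k \in \mathbb{Z}} A_k\, EX_{t-k} \otimes X_s,$$

and by stationarity

$$EY_u \otimes X_0 = \sum_{k \in \mathbb{Z}} A_k\, EX_{u-k} \otimes X_0,$$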

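where $u = t - s$. Now, noting that on the left we have the lagged cross-covariance operator
$C_u^{YX} = EY_u \otimes X_0$ and on the right we have the convolution of $A_k$ and the lagged
covariance operators $C_u^{X} = EX_u \otimes X_0$, taking the Fourier transform of both sides
yields the so-called cross-spectral operator between $\{Y_t\}$ and $\{X_t\}$, which can be obtained
as

$$\mathcal{F}_\theta^{YX} = A(\theta)\, \mathcal{F}_\theta^{X}, \tag{6}$$

where $A(\theta) = \sum_{k \in \mathbb{Z}} A_k e^{-ik\theta}$ is the frequency response function of
the filter $\{A_k\}_{k \in \mathbb{Z}}$,
$\mathcal{F}_\theta^{YX} = \sum_{k \in \mathbb{Z}} C_k^{YX} e^{-ik\theta}$ is the cross-spectral
density operator of $\{Y_t\}$ and $\{X_t\}$, and
$\mathcal{F}_\theta^{X} = \sum_{k \in \mathbb{Z}} C_k^{X} e^{-ik\theta}$ is the spectral density
operator of $\{X_t\}$.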

Given relation (6), we can again invert $\mathcal{F}_\theta^{X}$ and obtain a closed-form
expression for $A(\theta)$. The inverse Fourier transform then gives us the coefficients of the
model (4). Note that, although we employ more sophisticated tools than in linear regression (3),
the approach presented here is symbolically analogous, only developed in the frequency domain.
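
The frequency-domain recipe can be traced in a small finite-dimensional sketch that works with known (population) covariances instead of estimates. All numbers below are hypothetical: the lagged covariances of $X$ and the filter matrices `A` are chosen only so that the spectral density is invertible, and the sign convention $e^{-ik\theta}$ is assumed for all transforms.

```python
import numpy as np

def spectral(ops_by_lag, thetas):
    """F(theta) = sum_k C_k exp(-i k theta) for a dict {lag k: matrix C_k}."""
    return np.array([sum(C * np.exp(-1j * k * th) for k, C in ops_by_lag.items())
                     for th in thetas])

# Hypothetical lagged covariances of X (d = 2): C_k = E X_k X_0^T, so C_{-1} = C_1^T.
C0 = np.array([[2.0, 0.5], [0.5, 1.0]])
C1 = np.array([[0.3, 0.1], [0.0, 0.2]])
cov_x = {-1: C1.T, 0: C0, 1: C1}

# Hypothetical filter coefficients A_0, A_1, A_2 (all other A_k are zero).
A = {0: np.array([[1.0, 0.0], [0.5, 1.0]]),
     1: np.array([[0.4, -0.2], [0.0, 0.3]]),
     2: np.array([[0.1, 0.0], [0.2, -0.1]])}

# Cross-covariances from the convolution C^{YX}_u = sum_k A_k C^X_{u-k}.
cov_yx = {}
for k, A_k in A.items():
    for h, C_h in cov_x.items():
        cov_yx[k + h] = cov_yx.get(k + h, 0) + A_k @ C_h

T = 16
thetas = 2 * np.pi * np.arange(T) / T             # frequency grid on [0, 2 pi)
F_x = spectral(cov_x, thetas)                     # shape (T, 2, 2)
F_yx = spectral(cov_yx, thetas)
A_theta = F_yx @ np.linalg.inv(F_x)               # relation (6) solved for A(theta) at each frequency

# Inverse Fourier transform on the grid: A_k = (1/T) sum_j A(theta_j) exp(i k theta_j).
A_rec = {k: np.mean(A_theta * np.exp(1j * k * thetas)[:, None, None], axis=0).real
         for k in range(-2, 5)}
```

In this idealized setting the recovered matrices `A_rec[k]` should match `A[k]` for the lags 0, 1, 2 and be close to zero elsewhere. With real data the covariances have to be estimated and the inversion of the spectral density again requires regularization.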